Amazon's Project Rainier: The World's Most Powerful AI Computer

If you're lucky enough to be in Seattle on a clear, sunny day, you might overhear a local say, 'the mountain is out.' They're referring to Mount Rainier, the 14,410-foot (4,392-meter) stratovolcano that towers above the surrounding terrain. This commanding presence inspired Amazon Web Services (AWS) to name one of its most ambitious projects after it: Project Rainier, a massive machine designed to usher in the next generation of artificial intelligence (AI).

Project Rainier, announced at the end of last year and now well underway, is a one-of-a-kind endeavor: a colossal computer designed to train AI models with unprecedented power and speed. It is spread across multiple data centers in the U.S., on a scale unlike anything AWS has ever attempted.

A key partner in this initiative is Anthropic, an AI safety and research company. Anthropic will use the new 'AI compute cluster' to build and deploy future versions of its leading AI model, Claude. 'Rainier will provide five times more computing power compared to Anthropic’s current largest training cluster,' said Gadi Hutt, director of product and customer engineering at Annapurna Labs, the specialist chips arm of AWS.

For a frontier model like Claude, more training compute generally translates into a smarter and more accurate model. 'We’re building computational power at a scale that’s never been seen before and we’re doing it with unprecedented speed and agility,' Hutt added.

Project Rainier is designed as an 'EC2 UltraCluster of Trainium2 UltraServers.' EC2 refers to Amazon Elastic Compute Cloud, an AWS service that lets customers rent virtual computers in the cloud. The more interesting part is Trainium2, a custom-designed AWS computer chip built specifically for training AI systems. Unlike general-purpose chips, Trainium2 is specialized for processing, at high speed, the enormous amounts of data required to teach AI models how to complete a wide range of increasingly complex tasks.

A single Trainium2 chip can complete trillions of calculations a second. To put this in perspective, counting to one trillion at a rate of one number per second would take a person more than 31,700 years. A task that would take a human millennia can be done in the blink of an eye with Trainium2.
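The counting comparison is easy to check. The short Python sketch below assumes a pace of one number per second and a 365-day year. The article states neither assumption, and the function name is purely illustrative.

```python
# Assumption: one count per second, 365-day years (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def years_to_count(target: int, counts_per_second: float = 1.0) -> float:
    """Return the years needed to count up to `target` at a steady rate."""
    return target / counts_per_second / SECONDS_PER_YEAR


years = years_to_count(1_000_000_000_000)  # one trillion
print(f"{years:,.0f} years")
```

Under those assumptions the result lands just above 31,700 years, in line with the figure quoted above.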

Impressive, yes. But Project Rainier doesn’t just use one or even a few chips. This is where the UltraServers and UltraClusters come in. Traditionally, servers in a data center operate independently. When they need to share information, that data has to travel through external network switches, introducing latency. AWS’s solution is the UltraServer, which combines four physical Trainium2 servers, each with 16 Trainium2 chips. They communicate via specialized high-speed connections called 'NeuronLinks,' identifiable by their distinctive blue cables. These links allow data to move much faster within the system, significantly accelerating complex calculations across all 64 chips.
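The chip count per UltraServer follows directly from the figures above. Here is a minimal Python sketch of that arithmetic. The constant and function names are illustrative, not AWS terminology, and the larger count in the usage line is a hypothetical input rather than a published figure.

```python
# Figures taken from the article: 4 servers per UltraServer, 16 chips per server.
SERVERS_PER_ULTRASERVER = 4
CHIPS_PER_SERVER = 16


def chips_in_ultraservers(ultraserver_count: int) -> int:
    """Total Trainium2 chips in the given number of UltraServers."""
    return ultraserver_count * SERVERS_PER_ULTRASERVER * CHIPS_PER_SERVER


print(chips_in_ultraservers(1))    # one UltraServer
print(chips_in_ultraservers(100))  # hypothetical count, for illustration only
```

A single UltraServer works out to 4 x 16 = 64 chips, which matches the number cited in the paragraph above.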

When you connect tens of thousands of these UltraServers and point them all at the same problem, you get Project Rainier: a mega 'UltraCluster.' This scale is also why Hutt affectionately refers to Rainier as a 'friendly giant.'

Communication between components happens at two critical levels: NeuronLinks provide high-bandwidth connections within UltraServers, while Elastic Fabric Adapter (EFA) networking technology (identified by its yellow cables) connects UltraServers inside and across data centers. This two-tier approach maximizes speed where it's most needed while maintaining the flexibility to scale across multiple data center buildings.
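One way to picture the two-tier design is as a simple rule for which link carries traffic between two chips. The Python sketch below is a simplified model based only on the description above. The `ChipLocation` addressing scheme and the `interconnect` function are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipLocation:
    """Hypothetical address of a chip inside the cluster."""
    ultraserver: int  # index of the UltraServer within the cluster
    server: int       # index of the server within the UltraServer (0-3)
    chip: int         # index of the chip within the server (0-15)


def interconnect(a: ChipLocation, b: ChipLocation) -> str:
    """Return which tier of the network connects chips `a` and `b`."""
    if a == b:
        return "local"       # same chip, no network hop needed
    if a.ultraserver == b.ultraserver:
        return "NeuronLink"  # high-bandwidth tier inside one UltraServer
    return "EFA"             # tier that connects separate UltraServers


print(interconnect(ChipLocation(0, 0, 0), ChipLocation(0, 3, 15)))
print(interconnect(ChipLocation(0, 0, 0), ChipLocation(7, 0, 0)))
```

In this model, any two chips in the same UltraServer talk over NeuronLink, and everything else goes over EFA, whether the other UltraServer sits in the same building or a different one.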

Operating and maintaining such an enormous computer is not without its challenges. To ensure all of that gigantic capacity is available to customers, reliability is paramount. AWS builds its own hardware, giving it control over every aspect of the technology stack, from a chip’s tiniest components to the software that runs on it, to the complete design of the data center itself.

This kind of vertical integration is part of what gives AWS an advantage in the race to accelerate machine learning and to lower the cost barriers that keep AI from being more widely accessible. 'When you know the full picture, from the chip all the way to the software, to the servers themselves, then you can make optimizations where it makes the most sense,' said Annapurna director of engineering Rami Sinno.

'The team that engineers our data centers—from rack layouts to electrical distribution to cooling techniques—is continuously increasing energy efficiency,' said Hutt. 'Regardless of the scale AWS operates at, we always keep our sustainability goals front of mind.'

All of the electricity consumed by Amazon’s operations, including its data centers, was matched with 100% renewable energy resources in 2023. The company is investing billions of dollars in nuclear power and battery storage, and in financing large-scale renewable energy projects around the world to power its operations. In fact, for the past five years, Amazon has been the largest corporate purchaser of renewable energy in the world. The company is still on a path to be net-zero carbon by 2040, a goal that remains unchanged by the addition of Project Rainier and its continued worldwide growth.

Last year, AWS announced it would be rolling out new data center components that combine advances in power, cooling, and hardware, not only in data centers it’s currently building but also in existing facilities. These components are projected to reduce mechanical energy consumption by up to 46% and to cut the embodied carbon in the concrete used by 35%.

Project Rainier is a testament to AWS's commitment to innovation, reliability, and sustainability, setting a new standard in the world of AI computing.


